Abstract



Distributed power control for parallel Gaussian interference channels has recently drawn great interest.
However, all existing works studied this problem only under deterministic communication channels and
required certain perfect information to carry out their proposed algorithms. In this paper, we study this
problem for stochastic parallel Gaussian interference channels. In particular, we take into account the
randomness of the communication environment and the estimation errors of the desired information, and
thus formulate a stochastic noncooperative power control game. We then propose a stochastic distributed
learning algorithm, SDLA-I, to help communication pairs learn the Nash equilibrium. A careful convergence
analysis of SDLA-I is provided based on stochastic approximation theory and the projected dynamical
systems approach. We further propose another learning algorithm, SDLA-II, which adds a simple iterate
averaging step to SDLA-I to improve convergence performance. Numerical results are also presented to
demonstrate the performance of our algorithms and to support the theoretical results.

I. Introduction 



The interference channel has long drawn interest from both the information theory and communication
communities [1]. Indeed, the interference channel provides a good model for many communication systems,
from digital subscriber lines to wireless communication systems. Nevertheless, its capacity region is still
unknown in general, even in the Gaussian scenario. Moreover, compared to the flat interference channel,
less work has been done on frequency-selective interference channels. We refer to [2] for an overview
of interference channels.

In this paper we focus on power control in frequency-selective interference channels with Gaussian
noise, i.e., parallel Gaussian interference channels. It has been shown recently in [3] that obtaining
a globally optimal solution that maximizes the network sum rate is NP-hard in general. Nevertheless, a
distributed game-theoretic approach originally proposed in [4] has become increasingly popular. The key
assumption is that each individual communication pair is only interested in its own signal and simply
treats interference as noise when decoding, i.e., joint encoding/decoding and interference cancellation
techniques are not allowed. Obviously, this approach provides an inner bound of the capacity region of
parallel Gaussian interference channels. More importantly, this strategy is very appealing in practice due
to its simplicity and distributed nature. Indeed, receivers in current practical communication systems
generally treat interference as noise, even though substantial research has been carried out on
interference-aware receivers and significant performance gains are promised by multi-user techniques [5].

After the seminal work [4], different approaches have been applied to study distributed power control
in parallel Gaussian interference channels when the channel power gains are deterministic. Specifically,
[4], [6]-[8] are based on contraction mapping, [9] is based on piecewise affine mapping, [10] resorts
to variational inequality theory, and [11] formulates an equivalent linear complementarity problem. These
works focused on characterizing the Nash equilibrium (NE), such as its existence and uniqueness, and on
devising distributed algorithms along with convergence analysis. Indeed, the proposed iterative
water-filling algorithm (IWFA) has become a popular candidate for distributed power control in parallel
Gaussian interference channels.

Nevertheless, a common assumption in existing works is that a communication pair is only interested in
maximizing its immediate transmission rate. Moreover, it is assumed that communication channels remain
unchanged during the algorithmic iterations [4], [6], [7], [8], [10], [11]. However, the communication
time scale is usually large in common applications such as video transmission in wireless data networks
[12]. During the whole communication period, it is unlikely that channels would remain the same. In
these scenarios, a communication pair may be more interested in maximizing its long-term transmission
rate rather than the immediate one. In addition, existing works require exact CSI and/or interference
levels to be fed back to the corresponding transmitters during the algorithmic iterations.
Unfortunately, these quantities are difficult, if not impossible, to obtain exactly in practical
communication systems. The convergence results of existing schemes such as IWFA are no longer valid, or
are at least unknown, when the relevant estimation errors exist.

In this paper we take into account the randomness of the communication environment and the estimation
errors of the desired information. We assume each communication pair is concerned about the long-term
transmission rate, i.e., the expected transmission rate. We first propose a basic stochastic distributed
learning algorithm, SDLA-I, to help distributed communication pairs learn the NE in stochastic transmission
environments. The information needed to implement SDLA-I is also allowed to be subject to errors.
A careful convergence analysis of SDLA-I is provided based on stochastic approximation theory
[13], [14] and the projected dynamical systems (PDS) approach [15]. Inspired by recent developments in
stochastic approximation theory [16], [17], we propose another learning algorithm, SDLA-II, which adds
a simple iterate averaging step to the basic learning algorithm SDLA-I to improve the algorithmic
convergence performance.

The power control algorithms proposed in this paper belong to the class of stochastic power control
algorithms. Existing stochastic power control algorithms (see, e.g., [18], [19], [20] and references
therein) cannot be applied to the parallel Gaussian interference channels considered in this paper. Note
that the recent work [21] studied distributed power control for time-varying parallel Gaussian
interference channels. Nevertheless, the model formulated in [21] is essentially a deterministic one, so
IWFA is still applicable in [21]. In contrast, as explained in Section II, it would be extremely
difficult, if not impossible, to apply IWFA in our model. Moreover, [21] also requires exact CSI and
interference levels to be fed back to the corresponding transmitters during each iteration of the power
update.

The rest of this paper is organized as follows. Section II describes the specific system model and the
problem formulation. In Section III the basic learning algorithm SDLA-I is described along with a careful
convergence analysis. The PDS approach is adopted in Section IV to study the rate of convergence of
SDLA-I. We further include the idea of iterate averaging and propose SDLA-II in Section V. Section VI
presents some numerical results, and is followed by the conclusions in Section VII.



II. System Model and Problem Formulation

A. System Model

We consider a scenario consisting of a set of N source-destination pairs indexed by
$\mathcal{N} = \{1, 2, \ldots, N\}$. These communication pairs share a common set
$\mathcal{K} = \{1, 2, \ldots, K\}$ of frequency-selective unit-bandwidth channels, so that their
transmissions may interfere with each other. Specifically, the received signal at destination j on the
k-th channel can be described by the baseband signal model

$$ y_j^k = \sum_{i \in \mathcal{N}} h_{ji}^k \sqrt{p_i^k}\, x_i^k + z_j^k, \qquad (1) $$

where $h_{ji}^k$ denotes the channel coefficient from source i to destination j on the k-th channel,
$p_j^k$ denotes the transmission power used by source j on the k-th channel, $x_j^k$ denotes the
normalized transmission symbol of source j on the k-th channel, and $z_j^k$ denotes the white Gaussian
noise with variance $n_j^k$ at destination j on the k-th channel.

For later use, we let $g_{ji}^k = |h_{ji}^k|^2$. In time-varying communication scenarios, channel
coefficients are random variables. We denote by $G$ the random vector composed of all the random channel
power gain coefficients, i.e., $G_{ji}^k$, $\forall k \in \mathcal{K}$, $\forall j, i \in \mathcal{N}$.
For the sake of greater applicability we make no assumption on the specific underlying statistical
distribution of $G$. We simply assume that $G$ is bounded almost surely and that different realizations
$g$ of $G$ are independent and identically distributed (i.i.d.). This i.i.d. assumption on $G$ is
reasonable in large scale networks.

We further assume that each user is only interested in its own signal and treats interference as noise.
Thus, we can write the signal-to-interference-plus-noise ratio (SINR) at destination j on the k-th channel
with realization $g$ as

$$ \gamma_j^k(p \mid g) = \frac{g_{jj}^k p_j^k}{\sum_{i \in \mathcal{N}, i \neq j} g_{ji}^k p_i^k + n_j^k}. \qquad (2) $$

The corresponding maximum achievable rate $R_j$ for user j is given by the Shannon formula [22]

$$ R_j(p_j, p_{-j} \mid g) = \sum_{k=1}^{K} \log\left(1 + \gamma_j^k(p \mid g)\right), \qquad (3) $$

where $p_j = [p_j^1, p_j^2, \ldots, p_j^K]^T$ denotes the power allocation strategy of user j, and
$p_{-j}$ denotes the power allocation strategies of all the other users. The power allocation strategy of
each user should satisfy certain constraints. Specifically, $p_j$ is regulated by spectral mask
constraints, i.e., $0 \le p_j^k \le \bar{p}_j^k$, as well as a total power constraint, i.e.,
$\sum_{k \in \mathcal{K}} p_j^k \le p_j^{\max}$. In order to avoid trivial cases, we assume for all
$j \in \mathcal{N}$ that $\bar{p}_j^k < p_j^{\max}$, $\forall k \in \mathcal{K}$, and
$p_j^{\max} < \sum_{k \in \mathcal{K}} \bar{p}_j^k$.
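As a small numerical illustration of the SINR in (2) and the rate in (3), the following sketch evaluates both for one gain realization. The nested-list layout `g[k][j][i]`, the function names, the toy numbers, and the use of the natural logarithm (rate in nats) are assumptions made only for this example.

```python
import math

def sinr(p, g, noise, j, k):
    """SINR of user j on channel k, treating interference as noise.

    p[i][k]     : power of source i on channel k
    g[k][j][i]  : power gain from source i to destination j on channel k
    noise[j][k] : noise variance at destination j on channel k
    """
    interference = sum(g[k][j][i] * p[i][k] for i in range(len(p)) if i != j)
    return g[k][j][j] * p[j][k] / (interference + noise[j][k])

def rate(p, g, noise, j):
    """Achievable rate of user j (in nats) for one gain realization g."""
    return sum(math.log(1.0 + sinr(p, g, noise, j, k)) for k in range(len(p[j])))

# Toy setting (hypothetical values): two users, one channel.
p = [[1.0], [2.0]]
g = [[[1.0, 0.5], [0.5, 1.0]]]
noise = [[1.0], [1.0]]

example_sinr = sinr(p, g, noise, 0, 0)   # 1.0 * 1.0 / (0.5 * 2.0 + 1.0)
example_rate = rate(p, g, noise, 0)
```

Averaging `rate` over many independent realizations of `g` would give an empirical estimate of the expected rate that each pair is assumed to care about.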

B. Game Theoretical Formulation 

We now formulate the following noncooperative game to characterize the interaction among the users
in question:

$$ \mathcal{G} = \left\{ \mathcal{N}, \{\Phi_j\}_{j \in \mathcal{N}}, \{R_j(p_j, p_{-j})\}_{j \in \mathcal{N}} \right\}. \qquad (4) $$

In game $\mathcal{G}$, $\mathcal{N}$ is the set of players, i.e., communication pairs.
$R_j(p_j, p_{-j})$ is the utility function of user j given by

$$ R_j(p_j, p_{-j}) = E_G\left[ R_j(p_j, p_{-j} \mid G) \right], \qquad (5) $$

where $E_G[\cdot]$ denotes the expected value with respect to $G$. In the sequel we drop the subscript and
write $E[\cdot]$ instead of $E_G[\cdot]$ when this does not lead to confusion. Here we implicitly assume
that $R_j(p_j, p_{-j})$ exists. We further assume $R_j(p_j, p_{-j})$ is continuous with respect to $p$.
$\Phi_j$ is the strategy space of user j, defined as

$$ \Phi_j = \left\{ p_j \in \mathbb{R}^K : \sum_{k \in \mathcal{K}} p_j^k \le p_j^{\max},\; 0 \le p_j^k \le \bar{p}_j^k,\; \forall k \in \mathcal{K} \right\}. \qquad (6) $$

For later use, we denote by $\Phi$ the product space $\Phi_1 \times \cdots \times \Phi_N$.

Due to the uncertainty of the channel power gains, player j in the stochastic game $\mathcal{G}$ wishes to
maximize its expected transmission rate $R_j$ by choosing an appropriate power allocation strategy $p_j$.
Mathematically, player j solves the following optimization problem:

maximize $R_j(p_j, p_{-j})$
subject to $p_j \in \Phi_j$

where $R_j(p_j, p_{-j})$ and $\Phi_j$ are given in (5) and (6), respectively. Note that this is a
stochastic optimization problem [23].

We are interested in understanding if and how the players in the stochastic game $\mathcal{G}$ can
achieve an NE, which is a widely adopted rational outcome of noncooperative games. We formally define an
NE of the stochastic power control game $\mathcal{G}$ as follows.

Definition 1. A power allocation profile $p^* = (p_1^*, \ldots, p_N^*)$ is called an NE of the stochastic
power control game $\mathcal{G}$ if and only if

$$ p_j^* \in \arg\max \left\{ R_j(p_j, p_{-j}^*) : p_j \in \Phi_j \right\}, \quad \forall j \in \mathcal{N}. \qquad (7) $$

Game $\mathcal{G}$ has been extensively studied when the channel power gains are deterministic.
Nevertheless, new challenges arise due to the randomness in the channel power gains caused by the
stochastic communication environment. Indeed, player j in the stochastic game $\mathcal{G}$ may not even
know its utility function $E[R_j(p_j, p_{-j} \mid G)]$, for the following reasons. Firstly, the
distribution of $G$ is unknown even though $R_j(p_j, p_{-j} \mid g)$ is known. Thus, it is impossible to
evaluate $E[R_j(p_j, p_{-j} \mid G)]$ analytically or numerically. Secondly, even if the distribution of
$G$ were known, player j would need global knowledge to evaluate $E[R_j(p_j, p_{-j} \mid G)]$, which may
result in an unacceptable level of communication overhead. Furthermore, even assuming that player j has
global knowledge of the distribution of $G$, evaluating $E[R_j(p_j, p_{-j} \mid G)]$ involves
multi-dimensional integration and is thus computationally expensive. Since player j does not know its
exact utility function $E[R_j(p_j, p_{-j} \mid G)]$, it cannot compute a best response, which is an
essential component of IWFA. So IWFA cannot be applied to the stochastic game $\mathcal{G}$ investigated
in this paper.

III. Stochastic Algorithm for Learning NE 

A. Stochastic Distributed Learning Algorithm I 

We aim to design a distributed scheme so that an NE of the stochastic game $\mathcal{G}$ can be obtained
despite the difficulties described in the previous section. Obviously, such a distributed scheme makes
sense only when an NE exists. Thus, we first address the existence of an NE in the following proposition.

Proposition 1. At least one NE exists in the stochastic power control game $\mathcal{G}$.

Proof: See Appendix A. ■

In stochastic communication environments, a desired distributed scheme must offer users time to "learn"
the environment gradually. Ideally, an NE can be reached as users in game $\mathcal{G}$ keep adapting
their strategies during the learning process. Toward this end, we first define $f_j^k$ as

$$ f_j^k = \frac{g_{jj}^k p_j^k}{\sum_{i \in \mathcal{N}} g_{ji}^k p_i^k + n_j^k}, $$

which represents the ratio of the received energy of user j's signal to the total received signal energy
at destination j on the k-th channel. We let $f_j = [f_j^1, \ldots, f_j^K]^T$. Since
$R_j(p_j, p_{-j} \mid g)$ is concave with respect to $p_j$ and $f_j / p_j$ (with the division taken
componentwise) is the associated gradient, we have for any $p_j \in \Phi_j$

$$ R_j(q_j, p_{-j} \mid g) \le R_j(p_j, p_{-j} \mid g) + (f_j / p_j)^T (q_j - p_j), \quad \forall q_j \in \Phi_j. $$
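The claim that the componentwise ratio $f_j / p_j$ is the gradient of the rate with respect to $p_j$ can be checked numerically. The sketch below compares it against a central finite difference and also checks the concavity inequality at one test point; the helper names and the toy numbers (per-channel interference plus noise `c`, direct gains `g_jj`) are hypothetical and chosen only for illustration.

```python
import math

def rate_j(p_j, c, g_jj):
    """Rate of user j; c[k] is the interference plus noise on channel k."""
    return sum(math.log(1.0 + g * p / ck) for p, ck, g in zip(p_j, c, g_jj))

def energy_ratio(p_j, c, g_jj):
    """f_j^k: own received energy divided by total received energy."""
    return [g * p / (g * p + ck) for p, ck, g in zip(p_j, c, g_jj)]

p_j = [1.0, 2.0]
c = [2.0, 0.5]
g_jj = [1.0, 3.0]

# Gradient claimed in the text: f_j / p_j, taken componentwise.
grad = [f / p for f, p in zip(energy_ratio(p_j, c, g_jj), p_j)]

# Central finite-difference approximation of the same gradient.
h = 1e-6
fd = []
for k in range(len(p_j)):
    up, dn = list(p_j), list(p_j)
    up[k] += h
    dn[k] -= h
    fd.append((rate_j(up, c, g_jj) - rate_j(dn, c, g_jj)) / (2 * h))

# First-order (concavity) upper bound evaluated at another point q_j.
q_j = [0.5, 3.0]
bound = rate_j(p_j, c, g_jj) + sum(d * (q - p) for d, q, p in zip(grad, q_j, p_j))
```

Because the gradient only involves the direct gain and the total received energy, a receiver can form it from local measurements, which is what the algorithm described next relies on.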

Now we are in a position to describe the distributed learning algorithm SDLA-I for game $\mathcal{G}$ to
reach an NE. We formally summarize SDLA-I in Algorithm 1.

Algorithm 1 SDLA-I
Step 1: Initialization:

Each player $j \in \mathcal{N}$ starts with an arbitrary feasible power allocation vector, i.e.,
$p_j(0) \in \Phi_j$. Set $n := 0$.

Step 2: Computation:

Each player $j \in \mathcal{N}$ computes $p_j(n+1)$ by

$$ p_j(n+1) = \mathcal{P}_{\Phi_j}\left[ p_j(n) + a_j(n) \frac{\hat{f}_j(n)}{p_j(n)} \right], \qquad (9) $$

where $\mathcal{P}_{\Phi_j}$ denotes the projection onto $\Phi_j$ with respect to the Euclidean norm, and
$(a_j(n))_{n=0}^{\infty}$ is the step size sequence.

Step 3: Convergence Verification:

If the stopping criteria are satisfied, then stop; otherwise, set $n := n + 1$ and go to Step 2.

In equation (9), $\hat{f}_j(n) = (\hat{f}_j^1(n), \ldots, \hat{f}_j^K(n))^T$, where $\hat{f}_j(n)$ is an
approximate estimate of $f_j(n+1)$. Thus, receiver j can simply measure the total received signal energy
locally and extract its own signal energy on each subchannel. Receiver j then reports the corresponding
ratio vector $\hat{f}_j$ to transmitter j through a control channel. Note that SDLA-I does not require an
exact estimate. Mathematically,

$$ R_j(p_j, p_{-j}(n) \mid g(n+1)) \le R_j(p_j(n), p_{-j}(n) \mid g(n+1)) + \left( \hat{f}_j(n) / p_j(n) \right)^T (p_j - p_j(n)) + \epsilon_j(n), \quad \forall p_j \in \Phi_j, \qquad (10) $$

where $\epsilon_j(n) \ge 0$ measures the accuracy of the estimate $\hat{f}_j$. Note that all existing
algorithms for distributed power control in parallel Gaussian interference channels require exact CSI
and/or interference levels to be fed back to the corresponding transmitters [4], [6]-[8], [10], [11],
[21]. Unfortunately, it is hard, if not impossible, to obtain perfect knowledge of this information in
practical communication systems. The convergence results of existing schemes such as IWFA are no longer
valid, or are at least unknown, when the relevant estimation errors exist. Thus, as described above, our
proposed SDLA-I is more robust and requires less communication overhead. Nevertheless, the estimation
errors of $\hat{f}_j$ should not be too "bad". We later formalize the quantitative criteria that specify
how exact $\hat{f}_j$ should be.
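The three steps of SDLA-I can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: gains are drawn uniformly around hypothetical nominal values, the gradient estimate is computed from the current realization (so no extra estimation error is injected), the step size is taken as a(n) = 1/(n+1), a fixed iteration budget replaces the stopping criteria, and the projection is computed by bisection on the multiplier of the total power constraint. All function and variable names are invented for this example.

```python
import random

def project(q, mask, p_max, iters=60):
    """Euclidean projection onto {0 <= p[k] <= mask[k], sum(p) <= p_max}."""
    def clip(lam):
        return [max(0.0, min(x - lam, m)) for x, m in zip(q, mask)]
    p = clip(0.0)
    if sum(p) <= p_max:                  # total power constraint inactive
        return p
    lo, hi = 0.0, max(q)                 # clip(hi) is all zeros
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(clip(mid)) > p_max:
            lo = mid
        else:
            hi = mid
    return clip(hi)

def sample_gains(nominal, v, rng):
    """One i.i.d. realization: each gain perturbed uniformly by up to +/- v."""
    return [[[x * (1.0 + v * rng.uniform(-1.0, 1.0)) for x in row] for row in ch]
            for ch in nominal]

def sdla1(nominal, noise, mask, p_max, v=0.1, steps=200, seed=0):
    rng = random.Random(seed)
    K, N = len(nominal), len(nominal[0])
    # Step 1: feasible start (assumes p_max[j] / K does not exceed the mask).
    p = [[p_max[j] / K] * K for j in range(N)]
    for n in range(steps):               # Step 3 replaced by a fixed budget
        g = sample_gains(nominal, v, rng)
        a = 1.0 / (n + 1)                # assumed diminishing step size
        new_p = []
        for j in range(N):               # Step 2: simultaneous update (9)
            grad = []
            for k in range(K):
                total = sum(g[k][j][i] * p[i][k] for i in range(N)) + noise[j][k]
                grad.append(g[k][j][j] / total)   # equals f_j^k / p_j^k
            q = [p[j][k] + a * grad[k] for k in range(K)]
            new_p.append(project(q, mask[j], p_max[j]))
        p = new_p
    return p

# Hypothetical two-user, two-channel example.
N, K = 2, 2
nominal = [[[15.0 if i == j else 0.75 for i in range(N)] for j in range(N)]
           for _ in range(K)]
noise = [[0.05] * K for _ in range(N)]
mask = [[float("inf")] * K for _ in range(N)]
p_max = [20.0] * N
p_final = sdla1(nominal, noise, mask, p_max)
```

Writing the gradient as the direct gain divided by the total received energy avoids a division by zero when some $p_j^k(n)$ is zero; it is algebraically the same quantity as $f_j^k / p_j^k$.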

A careful reader may be concerned about the computational complexity of SDLA-I, since each player needs
to perform a projection in every iteration and projection is in general time-consuming. We address this
issue through the following proposition, which provides a closed-form solution for the projection
operation in (9), implying that SDLA-I can be carried out efficiently.

Proposition 2. The closed-form solution for the projection operation in (9) is given by

$$ p_j^k(n+1) = \left[ p_j^k(n) + a_j(n) \frac{\hat{f}_j^k(n)}{p_j^k(n)} - \lambda_j \right]_0^{\bar{p}_j^k}, \quad \forall k \in \mathcal{K}, \qquad (11) $$

where $[x]_a^b = \max(a, \min(x, b))$, and $\lambda_j \ge 0$ is chosen to satisfy
$\sum_{k \in \mathcal{K}} p_j^k(n+1) = p_j^{\max}$.

Proof: See Appendix B. ■
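The closed form in (11) reduces the projection to a one-dimensional search for the multiplier, which can be sketched as follows. The sketch assumes the total power constraint is an inequality, so the multiplier is taken as zero whenever the clipped point already satisfies it; otherwise bisection is used, since the clipped sum is continuous and nonincreasing in the multiplier. The function name and the bisection approach are choices made for this example, not taken from the paper.

```python
def project_power(q, mask, p_max, iters=100):
    """Projection of q onto {p : 0 <= p[k] <= mask[k], sum(p) <= p_max}.

    Implements p[k] = clip(q[k] - lam, 0, mask[k]) with lam >= 0 found
    by bisection so that the total power equals p_max when needed.
    """
    def clip(lam):
        return [max(0.0, min(x - lam, m)) for x, m in zip(q, mask)]

    p = clip(0.0)
    if sum(p) <= p_max:
        return p                     # constraint inactive, lam = 0
    lo, hi = 0.0, max(q)             # clip(max(q)) is the zero vector
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(clip(mid)) > p_max:
            lo = mid                 # still too much power: raise lam
        else:
            hi = mid
    return clip(hi)

active = project_power([3.0, 1.0], [10.0, 10.0], 2.0)    # lam = 1 here
inactive = project_power([0.5, -1.0], [1.0, 1.0], 2.0)   # only clipping
```

In the first call the multiplier settles at 1, so the second channel is switched off and the first carries the full budget; in the second call only the box constraints act.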

B. Convergence of SDLA-I 

In this subsection, we study the convergence property of SDLA-I. Toward this end, we first introduce
some further notation for ease of exposition. We denote by $D(n) = \mathrm{diag}(D_1(n), \ldots, D_N(n))$
the $NK \times NK$ block diagonal matrix, where $D_j(n) = \mathrm{diag}(a_j(n), \ldots, a_j(n))$ is a
$K \times K$ diagonal matrix with uniform diagonal entry $a_j(n)$. Then the iteration step (9) in SDLA-I
can be rewritten in the compact form

$$ p(n+1) = \mathcal{P}_{\Phi}[q(n)], \qquad (12) $$

where $q(n) = p(n) + D(n) \hat{f}(n) / p(n)$ with $\hat{f}(n) = (\hat{f}_1(n), \ldots, \hat{f}_N(n))^T$
and the division taken componentwise. Denote $\hat{s}_j(n) = \hat{f}_j(n) / p_j(n)$,
$s_j(n) = f_j(n) / p_j(n)$, and $\bar{s}_j = E[\nabla_{p_j} R_j(p_j, p_{-j} \mid G)]$. We group all the
$\hat{s}_j(n)$'s, $s_j(n)$'s, and $\bar{s}_j$'s into column vectors $\hat{s}(n)$, $s(n)$, and $\bar{s}$,
respectively. We further denote by $\Gamma$ the $N \times N$ matrix with $[\Gamma]_{ij}$ defined as



rn 



1 if z = j, 

- max fceX ;(nf n ) lf 1 ^ J- 



9jj n i 

With these notations in mind, the following lemma summarizes some main (in)equalities, which will 
be used in the later proofs of the convergence results of SDLA-I. 

Lemma 1. The following (in)equalities hold:

(i) A power allocation profile $p^* \in \Phi$ is an NE of the stochastic game $\mathcal{G}$ if and only
if for any $a_j > 0$

$$ p_j^* = \mathcal{P}_{\Phi_j}\left[ p_j^* + a_j \frac{f_j^*}{p_j^*} \right], \quad \forall j \in \mathcal{N}, \qquad (13) $$

where $f_j^* = p_j^* \bar{s}_j(p^*)$, with the product taken componentwise.

(ii) For any $p, p' \in \mathbb{R}^{NK}$,

$$ \| \mathcal{P}_{\Phi}(p) - \mathcal{P}_{\Phi}(p') \| \le \| p - p' \|. \qquad (14) $$

(iii) For any $p \in \mathbb{R}^{NK}$ and $p' \in \Phi$,

$$ (p' - p)^T (\mathcal{P}_{\Phi}(p) - p) \ge 0. \qquad (15) $$

Proof: See Appendix C. ■

The following lemma, inspired by [10], provides another inequality, (16), that will be used later. As a
byproduct, it also characterizes the uniqueness of the NE in game $\mathcal{G}$ with deterministic
channel power gains. We refer to [10] and references therein for a more detailed discussion of the
uniqueness of the NE in the deterministic game $\mathcal{G}$.

Lemma 2. For a given channel power gain realization $g$, if $\Gamma \succ 0$ (positive definite), then
there exists a unique NE $p^*(g) \in \Phi$, and

$$ s(p \mid g)^T (p^*(g) - p) \ge \tau(s) \, \| p - p^*(g) \|_2^2, \quad \forall p \in \Phi, \qquad (16) $$

with 

<s) = Amm( %, 2 < 17 ) 
max max (k, J 

ieN fce/c 

where A m i n (r) > denotes the minimal eigenvalue of the symmetric part of T, and k\ = (nf + 

Proof: See Appendix D. ■
We now describe the convergence results of SDLA-I in the following theorem. 

Theorem 1. Let $\mathcal{F}_n$ be the $\sigma$-field generated by $(g(m), p(m))_{m=0}^{n}$. Assume that:

(i) $\Gamma \succ 0$ holds almost surely.

(ii) The step sizes $a_j(n) > 0$ satisfy:

$$ \sum_{n=0}^{\infty} \min_{i \in \mathcal{N}} a_i(n) = +\infty, \qquad (18) $$

$$ \sum_{n=0}^{\infty} a_i^2(n) < +\infty, \quad \forall i \in \mathcal{N}. \qquad (19) $$

(iii) The estimation errors $\epsilon_i(n) \ge 0$ satisfy:

$$ \sum_{n=0}^{\infty} E[a_i(n) \epsilon_i(n) \mid \mathcal{F}_n] < +\infty, \quad \forall i \in \mathcal{N}. \qquad (20) $$

Then $(p(n))_{n=0}^{\infty}$ generated by SDLA-I converges to the unique NE $p^*$ of the stochastic game
$\mathcal{G}$ in the mean square sense, i.e., $\lim_{n \to \infty} \| p(n) - p^* \|^2 = 0$ almost surely.

Proof: See Appendix E. ■
We make the following remarks on the assumptions in Theorem 1:

Remark 1. Assumption (i) is the major requirement for the convergence of SDLA-I. On reflection, this
assumption is intuitive. From the game theory point of view, $\Gamma \succ 0$ implies that each player j
has a more significant influence on its own utility than the other players do. From the communication
point of view, $\Gamma \succ 0$ imposes upper bounds on the interference received and/or caused by
communication pair j. Under mild interference conditions, the achievable transmission rate of
communication pair j is not heavily influenced by the other communication pairs. Note that all existing
algorithms, even those for deterministic distributed power control such as IWFA, require more or less
similar conditions to ensure convergence [4], [6]-[8], [10], [11], [21].

Remark 2. Assumption (ii) is quite standard in stochastic approximation algorithms. Indeed, condition
(18), i.e., $\sum_{n=0}^{\infty} \min_{i \in \mathcal{N}} a_i(n) = +\infty$, ensures that SDLA-I can
cover the entire time axis to reach the NE in stochastic parallel Gaussian interference channels.
Meanwhile, choosing step sizes that satisfy condition (19) asymptotically suppresses the error variance
during the learning process.

Remark 3. Assumption (iii) provides a quantitative answer to the question of how good the estimate
$\hat{f}$ should be. Specifically,
$\sum_{n=0}^{\infty} E[a_j(n) \epsilon_j(n) \mid \mathcal{F}_n] < +\infty$ implies that the total
estimation error can be controlled. This assumption is reasonable especially in slow to medium
time-varying communication environments. Nevertheless, the estimate $\hat{f}$ may not be good enough in
fast time-varying scenarios, where better estimates may be required. Noting that $\hat{f}(n)$ only uses
the last feedback $f(n)$, one possible solution is to exploit the empirical distribution after having
observed many realizations of $f$. That is, the distributed communication pairs gradually learn more
about the environment, so that better estimates $\hat{f}(n)$ may be obtained. We do not explore this
topic further, as it is beyond the scope of this paper.

To further appreciate how SDLA-I works, let us consider a particular scenario where the difference
between $\hat{f}_j(n) / p_j(n)$ and $E[\nabla_{p_j} R_j(p_j(n), p_{-j}(n) \mid G)]$ is captured by a
random vector $\theta_j(n)$, i.e.,

$$ \hat{f}_j(n) / p_j(n) = E[\nabla_{p_j} R_j(p_j(n), p_{-j}(n) \mid G)] + \theta_j(n), \quad \forall j \in \mathcal{N}. \qquad (21) $$

As usual, we group all $\theta_j$'s into a column vector $\theta$, i.e.,
$\theta = (\theta_1, \ldots, \theta_N)^T$. In other words, we simply use the online estimate
$\hat{f}_j(n) / p_j(n)$ to approximate $E[\nabla_{p_j} R_j(p_j(n), p_{-j}(n) \mid G)]$, even though we
are not able to evaluate $E[\nabla_{p_j} R_j(p_j(n), p_{-j}(n) \mid G)]$. The approximation difference is
captured by $\theta_j$. We will show that SDLA-I converges as long as this simple approximation is not too
"bad". We formalize these ideas in Theorem 2. Toward this end, we first prove a simple lemma.

Lemma 3. The mapping $\bar{s}(p)$, where $\bar{s}_j(p) = E[\nabla_{p_j} R_j(p_j, p_{-j} \mid G)]$, is
Lipschitz continuous almost surely. That is, there exists a positive constant $L$ such that
$\forall p, q \in \Phi$,

$$ \| \bar{s}(p) - \bar{s}(q) \| \le L \, \| p - q \| \qquad (22) $$

holds almost surely.

Proof: See Appendix F. ■

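Theorem 2. Let $\mathcal{F}_n$ be the $\sigma$-field generated by $(g(m), p(m))_{m=0}^{n}$. Assume that:

(i) $\Gamma \succ 0$ holds almost surely.

(ii) The step sizes $a_j(n) > 0$ satisfy:

$$ 2 \tau(s) \min_{j \in \mathcal{N}} a_j(n) \ge L^2 \max_{j \in \mathcal{N}} a_j^2(n) + \delta(n), \qquad (23) $$

where $\delta(n)$ is any bounded positive constant.

(iii) The difference random vector $\theta(n)$ satisfies:

$$ E[\theta(n) \mid \mathcal{F}_n] = 0, \qquad (24) $$

$$ \sum_{n=0}^{\infty} \sum_{i \in \mathcal{N}} a_i^2(n) E[\| \theta_i(n) \|^2 \mid \mathcal{F}_n] < +\infty. \qquad (25) $$

Then $(p(n))_{n=0}^{\infty}$ generated by SDLA-I converges to the unique NE $p^*$ of the stochastic game
$\mathcal{G}$ in the mean square sense, i.e., $\lim_{n \to \infty} \| p(n) - p^* \|^2 = 0$ almost surely.

Proof: See Appendix G. ■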
Note that distributed algorithms based on the gradient projection mapping for deterministic parallel
Gaussian interference channels have been proposed in [7]. Nevertheless, the convergence of those
algorithms in [7] is only shown for deterministic scenarios and thus does not carry over to stochastic
cases. Theorem 2 establishes a theoretical foundation for the convergence of those deterministic
algorithms under stochastic scenarios. The key conditions are included in assumption (iii) of Theorem 2.
That is, the naive estimate $\hat{f}_j(n) / p_j(n)$ for
$E[\nabla_{p_j} R_j(p_j(n), p_{-j}(n) \mid G)]$ should not be too "bad" in the sense of assumption (iii)
of Theorem 2.

Moreover, the requirement (23) imposed on the step sizes is also reasonable. Consider a common step size
choice for every communication pair, i.e., $a_i(n) = a(n)$, $\forall i \in \mathcal{N}$. Ignoring the
arbitrarily small constant $\delta(n)$ for ease of exposition, condition (23) then reduces to
$a(n) \le 2 \tau(s) / L^2$, $\forall n$. That is, a larger step size can be taken if $s$ is more strongly
monotone (i.e., larger $\tau(s)$). In contrast, a smaller step size should be taken if $s$ changes more
significantly with respect to $p$ (i.e., larger Lipschitz constant $L$).

Though Theorem 2 is of theoretical interest, we remark that the assumptions in Theorem 2 may not be
easily verified. For instance, it is hard to know whether the difference random vector $\theta$ satisfies
assumption (iii) if little is known about the distribution of $G$ in real communication systems.
Moreover, requirement (23) imposed on the step sizes involves the strong monotonicity modulus $\tau(s)$
and the Lipschitz constant $L$, both of which depend on the specific distribution of the channel gains
$G$. In contrast, the step size choice in Theorem 1 is relatively standard. The requirement there is that
the total error in the stochastic gradient obtained by each local communication pair can be properly
controlled. This requirement may be easily satisfied when the parallel Gaussian interference channels do
not change too fast.

IV. Continuous Time Approximation by PDS 

Note that the previous convergence results do not provide insight into the speed of convergence of
SDLA-I. Indeed, they may be considered a study of the accuracy of SDLA-I. Equally important is the
convergence rate of SDLA-I. In this section, we shed some light on this question. We note that an exact
analysis of the convergence rate of SDLA-I is extremely difficult, if not impossible, due to the various
stochastic factors. Therefore, we resort to a PDS approach, which approximates SDLA-I but still captures
its essential behavior, to help us appreciate the convergence speed. Note that a PDS formulation for
transient behavior analysis of deterministic cognitive radio networks was also briefly described in [30].

To begin with, we recall some basic concepts of PDS from [15] to facilitate further discussion. Consider
a closed convex set $\mathcal{K} \subseteq \mathbb{R}^M$ and a vector field $\mathcal{F}$ whose domain
contains $\mathcal{K}$. Recall that $\mathcal{P}_{\mathcal{K}}$ denotes the norm projection. Then define
the projection of $\mathcal{F}$ at $x$ as

$$ \Pi_{\mathcal{K}}(x, \mathcal{F}) = \lim_{\delta \to 0^+} \frac{\mathcal{P}_{\mathcal{K}}(x + \delta \mathcal{F}) - x}{\delta}. \qquad (26) $$

Now we formally define PDS as follows.

Definition 2. The ordinary differential equation

$$ \dot{x}(t) = \Pi_{\mathcal{K}}(x(t), -\mathcal{F}(x(t))) \qquad (27) $$

with an initial value $x(0) \in \mathcal{K}$ is called the projected dynamical system
PDS($\mathcal{F}$, $\mathcal{K}$).

Note that the right-hand side of (27) is discontinuous on the boundary of $\mathcal{K}$ due to the
projection operator, which distinguishes PDS from classical dynamical systems.

Now let us consider PDS($\bar{s}$, $\Phi$) given by $\dot{p}(t) = \Pi_{\Phi}(p(t), \bar{s}(p(t)))$ with
initial value $p(0) \in \Phi$. The key results for this PDS are summarized in the following proposition.

Proposition 3. The PDS($\bar{s}$, $\Phi$) with initial value $p(0)$ has the following properties:

(i) It has a unique solution $p(t)$, which depends continuously on the vector field $\bar{s}$ and the
initial value $p(0)$;

(ii) A vector $p^* \in \Phi$ is the NE of the stochastic game $\mathcal{G}$ if and only if it is a
stationary point of PDS($\bar{s}$, $\Phi$), i.e., $\dot{p}^*(t) = 0$;

(iii) If $\Gamma \succ 0$ holds almost surely, then the stationary point $p^*$ of
PDS($\bar{s}$, $\Phi$) is unique and globally exponentially stable, i.e.,
$\| p(t) - p^* \| \le \| p(0) - p^* \| \exp(-\min_s \tau(s) \cdot t)$ with $\tau(s)$ given in (17).

Proof: See Appendix H. ■

PDS($\bar{s}$, $\Phi$) is the underlying idealized version of SDLA-I. In other words, we can view SDLA-I
as a stochastic approximation of PDS($\bar{s}$, $\Phi$) [15]. Thus, the iteration process
$(p(n))_{n=0}^{\infty}$ in SDLA-I approximates, or tracks, the solution $p(t)$ of
PDS($\bar{s}$, $\Phi$). From the above proposition, we know that $p(t)$ converges to $p^*$ at an
exponential rate. Note that the stationary point $p^*$ of PDS($\bar{s}$, $\Phi$) is also the limit point
of $(p(n))_{n=0}^{\infty}$. So we can expect that $p(n)$ moves toward $p^*$ in an approximately monotone
fashion (subject to the inherent stochastic variations) at an exponential rate. This understanding of the
iteration process in SDLA-I is also instrumental in exploring the idea of iterate averaging, which is
detailed in the next section.
V. Learning NE with Averaging

Fast convergence of a distributed learning algorithm for obtaining an NE of the power control game
$\mathcal{G}$ is clearly desirable in real communication systems. From the previous discussion, we can
see that the choice of good step sizes $a_j(n)$ has a profound effect on the convergence performance of
SDLA-I. In this section, we discuss how to improve the convergence performance of SDLA-I so that the
distributed communication pairs can learn the NE faster.

Among the various approaches proposed in stochastic approximation theory, the concept of iterate
averaging reported in [16] is an especially appealing and simple way to improve convergence performance.
It was shown in [16] that the averaged sequence $\sum_{i=1}^{n} p(i) / n$ converges to its limit if the
step size sequence $a(n)$ decays more slowly than the $O(1/n)$ rate used in the original Robbins-Monro
formulation [13]. This iterate averaging method is optimal in terms of convergence rate. We take
advantage of this technique to improve the convergence performance of SDLA-I.

Using the concept of iterate averaging, we add an averaging operation to the basic recursion (9) in
SDLA-I, i.e.,

$$ p_j(n+1) = \mathcal{P}_{\Phi_j}\left[ p_j(n) + a_j(n) \frac{\hat{f}_j(n)}{p_j(n)} \right], $$

$$ \tilde{p}_j(n+1) = \frac{1}{n+1} \left( n \tilde{p}_j(n) + p_j(n+1) \right). \qquad (28) $$

The stochastic learning algorithm with the above modified recursion will be referred to as SDLA-II. It
can be shown that $(\tilde{p}(n))_{n=0}^{\infty}$ generated by SDLA-II converges to the unique NE $p^*$
of the stochastic game $\mathcal{G}$ in the mean square sense, i.e.,
$\lim_{n \to \infty} \| \tilde{p}(n) - p^* \|^2 = 0$ almost surely, as long as $a_j(n)$ is a suitable
decreasing sequence, or even a fixed step size sequence with a small enough value. A detailed proof of
the convergence of SDLA-II can be carried out by following arguments similar to those in [16] and is thus
omitted here. We instead provide an intuitive explanation of why SDLA-II has a faster convergence rate
than SDLA-I. The idea behind SDLA-II is that we can use a larger step size in the basic online recursion
for $p(n)$, and the increased noise due to the larger step size can be smoothed out by the offline
averaging recursion for $\tilde{p}(n)$. As a result, SDLA-II converges faster with a larger step size and
is less likely to get stuck during the first few iterations [14]. Indeed, our numerical results
demonstrate the convergence rate improvement of SDLA-II over SDLA-I.
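The averaging recursion in (28) is a running mean of the basic iterates and needs only the previous average and the newest iterate. A minimal sketch, with a hypothetical function name and made-up iterates:

```python
def running_average(iterates):
    """Apply avg(n+1) = (n * avg(n) + p(n+1)) / (n + 1) to a list of iterates.

    iterates[n] plays the role of p(n+1); the returned list holds the
    averaged vectors after each update.
    """
    avg, history = None, []
    for n, p_next in enumerate(iterates):
        prev = avg if avg is not None else [0.0] * len(p_next)
        avg = [(n * a + x) / (n + 1) for a, x in zip(prev, p_next)]
        history.append(avg)
    return history

# Made-up basic iterates p(1), p(2), p(3) of one user on two channels.
averages = running_average([[1.0, 4.0], [2.0, 4.0], [3.0, 4.0]])
```

Because the recursion is incremental, a transmitter does not need to store past iterates, only the current average and the iteration count.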

A careful reflection on the power allocation trajectory $(p(n), \tilde{p}(n))_{n=0}^{\infty}$ generated
by SDLA-II reveals a potential obstacle to guaranteeing better convergence performance of SDLA-II over
SDLA-I. Specifically, with an arbitrary starting point $p(0)$, which is unlikely to be near the desired NE
$p^*$, we expect $p(n)$ to move toward the NE $p^*$ in an approximately monotone fashion at the early
stage, since the channel power gains of the parallel Gaussian interference channels satisfy
$\Gamma \succ 0$ and thus the underlying driving force $\bar{s}$ of the recursion for $p(n)$ is strongly
monotone. Nevertheless, after a sufficient number of iterations, the noise due to the randomness plays a
relatively significant role in the recursion compared to the underlying driving force $\bar{s}$. In other
words, $p(n)$ starts to hover randomly around the NE $p^*$ once $p(n)$ is near $p^*$. Only at this stage
can the iterate averaging $\tilde{p}(n)$ be successful, since averaging at this stage produces a mean
solution that is nearer to the NE $p^*$. This implies that the communication pairs should transmit with
the power level $p(n)$ generated by the basic recursion at the initial stage, and with the power level
$\tilde{p}(n)$ generated by averaging after SDLA-II has sufficiently converged.

The above reflection does not imply that SDLA-II is of limited use in practical communication systems.
It is true that SDLA-II offers little gain in convergence rate over SDLA-I when the initial stage is
relatively long compared to the communication period. Nevertheless, the communication time scale can be
large in common applications such as video transmission in wireless data networks [12]. Thus, SDLA-II
will yield a better estimate of the NE $p^*$ than SDLA-I in the long run.

VI. Numerical Results 

We provide some numerical results in this section for illustration purposes. Simulation parameters are
chosen as follows unless specified otherwise. Inspired by [11] and [30], we set both the number of users
and the number of channels to be 4. The channel power gains are chosen randomly from the intervals
$(\bar{g}_{ij}^{k}(1 - v), \bar{g}_{ij}^{k}(1 + v))$ with $v \in \{10\%, 20\%, 30\%, 40\%, 50\%\}$. Clearly, the perturbation
parameter $v$ can serve as an indicator of the time-varying rate of parallel Gaussian interference channels.
In particular, a larger $v$ implies a faster channel varying rate. We further let $\bar{g}_{ij}^{k} = 15$ if $i = j$ and
$0.75$ otherwise. With this choice of simulation parameters, one can verify that $\Gamma \succ 0$ almost surely
if $v \in \{10\%, 20\%, 30\%\}$. For clarity, we relax the spectral constraints, i.e., $\bar{p}_i^{k} = +\infty$,
$\forall i \in \mathcal{N}, \forall k \in \mathcal{K}$. The total power constraint is $p_i^{\max} = 10 \cdot N = 40$, $\forall i \in \mathcal{N}$.
Besides, the background noise level is $n_i^{k} = 0.1/N = 0.025$, $\forall i \in \mathcal{N}, \forall k \in \mathcal{K}$.
We also choose a common step size for all users, so we simply write $a_i(n)$ as $a_n$ in this section.
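The channel model described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the simulation code behind the reported results: the text says the gains are "chosen randomly" from the stated intervals, and the uniform distribution, the function name `sample_gains`, and the array layout (receiver, transmitter, channel) are all assumptions made here.

```python
import numpy as np

def sample_gains(rng, n_users=4, n_channels=4, v=0.2, g_direct=15.0, g_cross=0.75):
    """Draw one realization of the channel power gains.

    Assumption: each gain is uniform on (g * (1 - v), g * (1 + v)), where the
    nominal value g is g_direct for i == j and g_cross otherwise.
    """
    nominal = np.full((n_users, n_users), g_cross)
    np.fill_diagonal(nominal, g_direct)
    # Use the same nominal value on every channel: shape (users, users, channels).
    nominal = np.repeat(nominal[:, :, None], n_channels, axis=2)
    return rng.uniform(nominal * (1.0 - v), nominal * (1.0 + v))

rng = np.random.default_rng(0)
gains = sample_gains(rng, v=0.3)  # one channel realization for v = 30%
```

A fresh call per iteration gives the sequence of channel realizations; a larger `v` widens the intervals and so models a faster-varying channel.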

We first compare our proposed SDLA-I with the popular IWFA. We let users using IWFA have perfect
CSI and interference levels at the corresponding transmitters in every power update, while users
implementing SDLA-I only have stochastic gradients subject to errors. Due to limited space, we
only show the power evolution of user 1 on channel 1 as a function of the iteration index in Fig. 1. As
expected, even with perfect CSI and interference levels, the power evolution generated by IWFA fluctuates
significantly. In contrast, users in SDLA-I are more concerned about the long-term transmission rates.
Consequently, the power evolution fluctuates only mildly after a sufficiently long period of learning about
the environment. Another interesting observation here is that constant step sizes also lead to convergence
of SDLA-I. Indeed, one can show that $(p(n))_{n=0}^{\infty}$ converges to a neighborhood of the unique NE $p^*$
with sufficiently small constant step sizes. We omit the details due to limited space.

Though both the constant step size $O(1)$ and the decreasing step size $O(1/n)$ can lead to the convergence
of SDLA-I in numerical experiments, we observe that a tradeoff exists between the convergence rate and
the exactness of the converged value, which is evaluated by the standard normalized squared error (NSE)
defined as

$$\mathrm{NSE}(n) = \frac{\| p(n) - p^* \|_2^2}{\| p^* \|_2^2}. \tag{29}$$
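As a small illustration, the NSE can be computed as follows. The sketch assumes squared Euclidean norms in both numerator and denominator, as the name "normalized squared error" suggests; the function name `nse` is introduced here for illustration only.

```python
import numpy as np

def nse(p, p_star):
    """Normalized squared error between an iterate p and a reference point p_star.

    Assumption: squared Euclidean norms are used in numerator and denominator.
    """
    p = np.asarray(p, dtype=float).ravel()
    p_star = np.asarray(p_star, dtype=float).ravel()
    return float(np.sum((p - p_star) ** 2) / np.sum(p_star ** 2))
```

Evaluating `nse` along the trajectory of iterates gives the kind of curve used to compare step-size choices.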

The numerical results are shown in Fig. 2. As shown, the decreasing step size $O(1/n)$ has a better
convergence rate than the constant step size $O(1)$, since $O(1/n)$ goes to 0 very fast and thus new channel
power gain realizations have little effect on the power update. However, the solution obtained with the
decreasing step size is not as exact as those obtained with constant step sizes. Nevertheless, an appropriate
choice of the constant step size is necessary to trade off the convergence rate and the exactness of the
converged value. Indeed, the convergence rate with $a_n = 0.01$ is very slow, as shown in Fig. 2. Besides,
the numerical results in Fig. 2 also demonstrate the exponential convergence rate predicted by the
continuous time approximation using PDS in Section IV.

We show the impact of the time-varying rate of parallel Gaussian interference channels on the convergence
performance of SDLA-I in Fig. 3. As described, the perturbation parameter $v$ can be used to model the
time-varying rate of parallel Gaussian interference channels in our setting. The power evolutions of user 1
on channel 1 as a function of the iteration index are plotted for different values of $v$ in Fig. 3. It is shown
that the power allocation does not converge when $v \in \{40\%, 50\%\}$. Indeed, one can verify that
$\Gamma \succ 0$ cannot hold almost surely with $v \in \{40\%, 50\%\}$. Thus, the convergence of SDLA-I is not
guaranteed by Theorem 1. Note that $\Gamma \succ 0$ is also required in one way or another in existing
distributed power control algorithms, including IWFA, for deterministic parallel Gaussian interference
channels. This numerical example shows the importance of the condition $\Gamma \succ 0$ for power control in
stochastic parallel Gaussian interference channels as well.

We next show the performance improvement by iterate averaging in terms of convergence rate. In Fig. 4,
users transmit with the power level $p(n)$ under SDLA-I. In contrast, under pure SDLA-II, users transmit
with the power level $\bar{p}(n)$, which is generated by averaging $p(n)$. The mixed SDLA-III in Fig. 4 represents
a transmission scenario where users transmit with the power level $p(n)$ during the first 100 iterations, and
afterwards transmit with the power level $\bar{p}(n)$ generated by averaging $p(n)$ from the 101st iteration. As
expected, the iterate averaging $\bar{p}(n)$ starts to work after $p(n)$ is near the NE $p^*$.
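The mixed scheme described above can be sketched as follows. This is a hedged illustration rather than the exact SDLA-II recursion: it assumes a plain running average of the basic iterates started at a fixed switch point, and the function name `mixed_transmit_powers` and the 0-based indexing are choices made here.

```python
import numpy as np

def mixed_transmit_powers(iterates, switch=100):
    """Return the power levels actually transmitted under a mixed scheme.

    iterates: array of shape (n_iter, dim) holding the basic iterates p(n).
    For the first `switch` iterations the basic iterate is transmitted; from
    then on the running average of the iterates seen since the switch is used.
    """
    iterates = np.asarray(iterates, dtype=float)
    out = iterates.copy()
    avg = None
    for n in range(switch, len(iterates)):
        m = n - switch + 1  # number of iterates averaged so far
        # Recursive form of the sample mean: avg_m = avg_{m-1} + (p - avg_{m-1}) / m
        avg = iterates[n] if avg is None else avg + (iterates[n] - avg) / m
        out[n] = avg
    return out
```

With `switch=100`, the first 100 rows are passed through unchanged and averaging begins at the 101st iterate, which matches the scenario in the text; choosing the switch point adaptively is outside the scope of this sketch.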

VII. Conclusion 

In this paper, we investigate the distributed power control problem for stochastic parallel Gaussian
interference channels. We formulate the problem in question as a noncooperative stochastic game $\mathcal{G}$.
New challenges arise since users in game $\mathcal{G}$ cannot even know their exact utility functions. To address
these difficulties, we first propose a basic learning algorithm SDLA-I to help users learn the NE in a
distributed fashion. The convergence property of SDLA-I is carefully analyzed using stochastic
approximation theory. Besides, we provide a continuous time approximation by PDS to assess the
convergence speed of SDLA-I. Inspired by recent developments in stochastic approximation theory, we also
propose another learning algorithm, SDLA-II, by including a simple iterate averaging idea into SDLA-I.
Numerical results are provided to demonstrate the theoretical results and algorithms. Since existing works
only considered deterministic transmission scenarios, our work fills the gap by studying the distributed
power control problem in stochastic transmission scenarios.

Appendix A 
Proof of Proposition 1 

Proof: It is obvious that $\Phi$ is a convex, nonempty, and compact set. Besides, $R_j(p_j, p_{-j})$ is jointly
continuous by assumption. Noting further that $R_j(p_j, p_{-j}|g)$ is concave with respect to $p_j$, we conclude
that $R_j(p_j, p_{-j}) = \mathbb{E}[R_j(p_j, p_{-j}|G)]$ is also concave with respect to $p_j$ since the expectation
operation preserves concavity. The existence of an NE thus follows from standard results in game theory [22]. ■

Appendix B 
Proof of Proposition 2 

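Proof: We can prove this proposition by analyzing the well-known Karush-Kuhn-Tucker (KKT)
conditions in optimization theory [23]. To begin with, note that the projection operation is equivalent
to the following optimization problem:

$$
\begin{aligned}
\underset{p_j(n+1)}{\text{minimize}} \quad & \frac{1}{2} \| p_j(n+1) - (p_j(n) + a_j(n) s_j(n)) \|_2^2 \\
\text{subject to} \quad & \sum_{k \in \mathcal{K}} p_j^k(n+1) \leq p_j^{\max}, \\
& 0 \leq p_j^k(n+1) \leq \bar{p}_j^k, \ \forall k \in \mathcal{K},
\end{aligned} \tag{30}
$$

where $s_j(n)$ denotes the stochastic gradient used in the update of user $j$ and $s_j^k(n)$ its $k$-th component.
This quadratic optimization problem is strictly convex. Therefore, the corresponding solution $p_j(n+1)$
can be obtained from the KKT conditions, which are both necessary and sufficient for optimality [23].
Toward this end, consider the Lagrangian:

$$
\begin{aligned}
\mathcal{L}(p_j(n+1), \lambda_j, u_j, v_j) ={}& \frac{1}{2} \sum_{k \in \mathcal{K}} \big( p_j^k(n+1) - (p_j^k(n) + a_j(n) s_j^k(n)) \big)^2 + \lambda_j \Big( \sum_{k \in \mathcal{K}} p_j^k(n+1) - p_j^{\max} \Big) \\
& + \sum_{k \in \mathcal{K}} u_j^k \big( p_j^k(n+1) - \bar{p}_j^k \big) - \sum_{k \in \mathcal{K}} v_j^k p_j^k(n+1),
\end{aligned}
$$

where $\lambda_j$, $u_j = [u_j^1, \ldots, u_j^K]^T$, and $v_j = [v_j^1, \ldots, v_j^K]^T$ are the associated Lagrangian multipliers. Then the KKT
conditions are given by

$$
\begin{aligned}
& \frac{\partial \mathcal{L}}{\partial p_j^k(n+1)} = p_j^k(n+1) - (p_j^k(n) + a_j(n) s_j^k(n)) + \lambda_j + u_j^k - v_j^k = 0, \ \forall k \in \mathcal{K}, \\
& u_j^k \geq 0, \quad p_j^k(n+1) \leq \bar{p}_j^k, \quad u_j^k \big( p_j^k(n+1) - \bar{p}_j^k \big) = 0, \ \forall k \in \mathcal{K}, \\
& v_j^k \geq 0, \quad p_j^k(n+1) \geq 0, \quad v_j^k p_j^k(n+1) = 0, \ \forall k \in \mathcal{K}, \\
& \lambda_j \geq 0, \quad \sum_{k \in \mathcal{K}} p_j^k(n+1) \leq p_j^{\max}, \quad \lambda_j \Big( \sum_{k \in \mathcal{K}} p_j^k(n+1) - p_j^{\max} \Big) = 0.
\end{aligned} \tag{32}
$$

Now for any $k \in \mathcal{K}$, we observe that if $p_j^k(n+1) = 0$, we have $u_j^k = 0$ by the complementary slackness
condition. Furthermore, we have $-v_j^k = p_j^k(n) + a_j(n) s_j^k(n) - \lambda_j \leq 0$. By a similar argument, we
obtain that $p_j^k(n+1) = p_j^k(n) + a_j(n) s_j^k(n) - \lambda_j$ if $0 < p_j^k(n+1) < \bar{p}_j^k$, and
$p_j^k(n) + a_j(n) s_j^k(n) - \lambda_j \geq \bar{p}_j^k$ if $p_j^k(n+1) = \bar{p}_j^k$. This completes the proof. ■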
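The KKT analysis above suggests a simple numerical recipe: each component of the projection is the unconstrained update shifted by a common multiplier and clipped to its box, with the multiplier chosen so that the total power constraint holds. The following Python sketch implements that idea with bisection on the multiplier. It is an illustration under stated assumptions, not code from the paper: the function name `project_power`, the use of bisection, and the fixed iteration count are choices made here, and `x` stands for the unconstrained update (current power plus step size times gradient estimate).

```python
import numpy as np

def project_power(x, p_max, mask=None, n_bisect=100):
    """Euclidean projection of x onto {p : sum(p) <= p_max, 0 <= p <= mask}.

    Uses the structure p_k = clip(x_k - lam, 0, mask_k) with lam >= 0,
    where lam is found by bisection when the sum constraint is active.
    Assumes p_max >= 0; mask=None means no per-channel upper bound.
    """
    x = np.asarray(x, dtype=float)
    upper = np.full_like(x, np.inf) if mask is None else np.asarray(mask, dtype=float)

    def clipped(lam):
        return np.clip(x - lam, 0.0, upper)

    p = clipped(0.0)
    if p.sum() <= p_max:  # sum constraint inactive, so lam = 0
        return p
    lo, hi = 0.0, float(np.max(x))  # at lam = max(x) every component is zero
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if clipped(mid).sum() > p_max:
            lo = mid
        else:
            hi = mid
    return clipped(hi)  # hi always satisfies the sum constraint
```

For instance, `project_power([3.0, 1.0, -1.0], p_max=2.0)` has an active sum constraint with multiplier 1, so only the first channel keeps positive power.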

Appendix C 
Proof of Lemma 1 

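Proof: (i). We know that if $p^* \in \Phi$ is an NE, then for any $a_j > 0$ [24],

$$p_j^* = \mathcal{P}_{\Phi_j}\big[ p_j^* + a_j \nabla_{p_j} \mathbb{E}[R_j(p_j^*, p_{-j}^*|G)] \big], \ \forall j \in \mathcal{N}. \tag{33}$$

The result follows if the interchange of mathematical expectations and gradient signs is justified. Recall
that the realization $g$ is bounded by assumption. Then it is straightforward to verify that
$\| \nabla_{p_j} R_j(p_j, p_{-j}|g) \|_2$ is also bounded. Thus, $\nabla_{p_j} \mathbb{E}[R_j(p_j, p_{-j}|G)] = \mathbb{E}[\nabla_{p_j} R_j(p_j, p_{-j}|G)]$ [25].

(ii). This is a well-known result on the nonexpansive property of projection, the proof of which can be
found in, e.g., [24].

(iii). Since $\mathcal{P}_{\Phi}(p)$ minimizes $\frac{1}{2} \| \tilde{p} - p \|_2^2$ over all $\tilde{p} \in \Phi$, we have

$$(\tilde{p} - \mathcal{P}_{\Phi}(p))^T (\mathcal{P}_{\Phi}(p) - p) \geq 0, \ \forall \tilde{p} \in \Phi, \ p \in \mathbb{R}^{NK}, \tag{34}$$

by the optimality condition [24]. Noting another obvious fact:

$$(\mathcal{P}_{\Phi}(p) - p)^T (\mathcal{P}_{\Phi}(p) - p) \geq 0, \ \forall p \in \mathbb{R}^{NK}, \tag{35}$$

we conclude, by adding (34) and (35), that $(\tilde{p} - p)^T (\mathcal{P}_{\Phi}(p) - p) \geq 0$ for any $p \in \mathbb{R}^{NK}$ and $\tilde{p} \in \Phi$. ■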
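Properties (ii) and (iii) are easy to sanity-check numerically. The sketch below does so for a simple box, which is an assumption made purely for illustration (the feasible set in the paper also carries a total power constraint); the function names are hypothetical.

```python
import numpy as np

def project_box(p, lo=0.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^dim (a simple convex set)."""
    return np.clip(p, lo, hi)

def check_projection_properties(n_trials=1000, dim=4, seed=0):
    """Check nonexpansiveness and the inner-product inequality on random points."""
    rng = np.random.default_rng(seed)
    for _ in range(n_trials):
        x, y = rng.normal(scale=3.0, size=(2, dim))  # arbitrary points
        z = rng.uniform(0.0, 1.0, size=dim)          # a point inside the box
        px, py = project_box(x), project_box(y)
        # (ii): the distance between projections never exceeds the original distance
        nonexpansive = np.linalg.norm(px - py) <= np.linalg.norm(x - y) + 1e-12
        # (iii): (z - x)^T (P(x) - x) >= 0 for any z in the set
        inner_ok = (z - x) @ (px - x) >= -1e-12
        if not (nonexpansive and inner_ok):
            return False
    return True
```

A passing check is of course no substitute for the proof; it only illustrates what the two inequalities say.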

Appendix D 
Proof of Lemma 2 

Proof: Following Proposition 2 in [10], if $\Gamma \succ 0$ under given $g$, then

$$(s(q|g) - s(p|g))^T (p - q) \geq \tau(s) \| p - q \|_2^2, \ \forall p, q \in \Phi, \tag{36}$$

with $\tau(s)$ specified in (11). That is, $s(\cdot|g)$ is strongly monotone on $\Phi$. The uniqueness of the NE
$p^*(g) \in \Phi$ follows (see, e.g., [26]).

Furthermore, by the equivalence of the standard NE problem and the variational inequality (VI)¹ with
mapping $-s(\cdot|g)$, we have

$$s(p^*|g)^T (p - p^*) \leq 0, \ \forall p \in \Phi. \tag{37}$$

Substituting $p^*$ for $q$ in (36), we obtain

$$(s(p^*|g) - s(p|g))^T (p - p^*) \geq \tau(s) \| p - p^* \|_2^2, \ \forall p \in \Phi. \tag{38}$$

Thus, the desired inequality (18) immediately follows from (37) and (38). This completes the proof. ■

¹ Given a set $K \subset \mathbb{R}^n$ and a mapping $F : K \to \mathbb{R}^n$, the variational inequality $\mathrm{VI}(K, F)$ is to find a
vector $x \in K$ such that $(y - x)^T F(x) \geq 0$, $\forall y \in K$ [26].

Appendix E
Proof of Theorem 1 

Proof: We first derive a recursive inequality characterizing the relationship between $\| p(n) - p^* \|_2$
and $\| p(n+1) - p^* \|_2$ in the following lemma.

Lemma 4. The sequence $(p(n))_{n=0}^{\infty}$ generated by iteration (12) satisfies

$$
\| p(n+1) - p^* \|_2^2 \leq \| p(n) - p^* \|_2^2 + 5C^2 a^2(n) + 2 \sum_{i \in \mathcal{N}} a_i(n) \epsilon_i(n) - 2 \sum_{i \in \mathcal{N}} a_i(n) s_i(n+1)^T (p_i^* - p_i(n)), \tag{39}
$$

where $p^* \in \Phi$ is any NE, $C$ is some large enough constant, and $a(n) = \big( \sum_{i \in \mathcal{N}} a_i^2(n) \big)^{1/2}$.

Proof: This proof is inspired by constructions from [27], [28]. Consider a fixed trajectory $(g(n))_{n=0}^{\infty}$.
Recall that $q(n) = p(n) + D(n)s(n)$ and $p(n+1) = \mathcal{P}_{\Phi}[q(n)]$. We first have

$$
\begin{aligned}
\| p(n+1) - p(n) \|_2 &\leq \| q(n) - p(n) \|_2 = \| p(n) + D(n)s(n) - p(n) \|_2 \\
&= \| D(n)s(n) \|_2 = \Big( \sum_{i \in \mathcal{N}} a_i^2(n) \| s_i(n) \|_2^2 \Big)^{1/2} \leq C \Big( \sum_{i \in \mathcal{N}} a_i^2(n) \Big)^{1/2} = C a(n),
\end{aligned} \tag{40}
$$

where the first inequality follows from Lemma 1(ii) and $C$ is some large enough constant. The existence
of $C$ is guaranteed by the boundedness of $s$. We proceed by deriving that

$$
\begin{aligned}
& C^2 a^2(n) + \| p(n) - p^* \|_2^2 - \| p(n+1) - p^* \|_2^2 \\
&\geq \| p(n+1) - p(n) \|_2^2 + \| p(n) - p^* \|_2^2 - \| p(n+1) - p^* \|_2^2 \\
&= 2 (p(n+1) - p(n))^T (p^* - p(n)) \\
&= 2 (p(n+1) - q(n))^T (p^* - p(n)) + 2 (D(n)s(n))^T (p^* - p(n)) \\
&= 2 (p(n+1) - q(n))^T (p^* - q(n)) + 2 (p(n+1) - q(n))^T (q(n) - p(n)) + 2 (D(n)s(n))^T (p^* - p(n)) \\
&\geq 2 (p(n+1) - q(n))^T (q(n) - p(n)) + 2 (D(n)s(n))^T (p^* - p(n)) \\
&= 2 (p(n+1) - p(n))^T (q(n) - p(n)) + 2 (p(n) - q(n))^T (q(n) - p(n)) + 2 (D(n)s(n))^T (p^* - p(n)) \\
&\geq -2 \| p(n+1) - p(n) \|_2 \| q(n) - p(n) \|_2 - 2 \| p(n) - q(n) \|_2^2 + 2 (D(n)s(n))^T (p^* - p(n)) \\
&\geq -4 C^2 a^2(n) + 2 (D(n)s(n))^T (p^* - p(n)),
\end{aligned} \tag{41}
$$

where the first inequality follows from (40), the second inequality follows from Lemma 1(iii), and the
last inequality also follows from (40).

Rearranging terms in (41), we proceed as follows:

$$
\begin{aligned}
\| p(n+1) - p^* \|_2^2 &\leq \| p(n) - p^* \|_2^2 + 5C^2 a^2(n) - 2 (D(n)s(n))^T (p^* - p(n)) \\
&= \| p(n) - p^* \|_2^2 + 5C^2 a^2(n) - 2 \sum_{i \in \mathcal{N}} (a_i(n) s_i(n))^T (p_i^* - p_i(n)) \\
&\leq \| p(n) - p^* \|_2^2 + 5C^2 a^2(n) + 2 \sum_{i \in \mathcal{N}} a_i(n) \epsilon_i(n) \\
&\quad + 2 \sum_{i \in \mathcal{N}} a_i(n) \big( R_i(p_i(n), p_{-i}(n)|g(n+1)) - R_i(p_i^*, p_{-i}(n)|g(n+1)) \big) \\
&\leq \| p(n) - p^* \|_2^2 + 5C^2 a^2(n) + 2 \sum_{i \in \mathcal{N}} a_i(n) \epsilon_i(n) + 2 \sum_{i \in \mathcal{N}} a_i(n) s_i(n+1)^T (p_i(n) - p_i^*),
\end{aligned} \tag{42}
$$

where the second inequality follows from (10), and the last inequality follows from the concavity of
$R_i(\cdot, \cdot\,|\,g)$ with respect to the first argument. This completes the proof. ■

We further need the following well-known lemma in stochastic approximation theory [29].

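Lemma 5. Let $\{\mathcal{F}_n\}$ be an increasing sequence of $\sigma$-algebras and $e_n, \alpha_n, \beta_n, \eta_n$ be finite, nonnegative,
$\mathcal{F}_n$-measurable random variables. If it holds almost surely that $\sum_{n=0}^{\infty} \alpha_n < \infty$, $\sum_{n=0}^{\infty} \beta_n < \infty$, and

$$\mathbb{E}(e_{n+1}|\mathcal{F}_n) \leq (1 + \alpha_n) e_n + \beta_n - \eta_n, \tag{43}$$

then $(e_n)_{n=0}^{\infty}$ converges and $\sum_{n=0}^{\infty} \eta_n < \infty$ almost surely.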

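Now taking the conditional expectation $\mathbb{E}[\cdot|\mathcal{F}_n]$ of both sides in (39) yields

$$
\begin{aligned}
& \mathbb{E}[\| p(n+1) - p^* \|_2^2 \mid \mathcal{F}_n] \\
&\leq \mathbb{E}[\| p(n) - p^* \|_2^2 \mid \mathcal{F}_n] + \mathbb{E}\Big[ 5C^2 a^2(n) + 2 \sum_{i \in \mathcal{N}} a_i(n) \epsilon_i(n) \,\Big|\, \mathcal{F}_n \Big] - \mathbb{E}\Big[ 2 \sum_{i \in \mathcal{N}} a_i(n) s_i(n+1)^T (p_i^* - p_i(n)) \,\Big|\, \mathcal{F}_n \Big] \\
&= \| p(n) - p^* \|_2^2 + \mathbb{E}\Big[ 5C^2 a^2(n) + 2 \sum_{i \in \mathcal{N}} a_i(n) \epsilon_i(n) \,\Big|\, \mathcal{F}_n \Big] - \mathbb{E}\Big[ 2 \sum_{i \in \mathcal{N}} a_i(n) s_i(n+1)^T (p_i^* - p_i(n)) \,\Big|\, \mathcal{F}_n \Big].
\end{aligned} \tag{44}
$$

We see that (43) is satisfied by substituting $e_n = \| p(n) - p^* \|_2^2$, $\alpha_n = 0$,
$\beta_n = \mathbb{E}[5C^2 a^2(n) + 2 \sum_{i \in \mathcal{N}} a_i(n) \epsilon_i(n) \mid \mathcal{F}_n]$, and
$\eta_n = \mathbb{E}[2 \sum_{i \in \mathcal{N}} a_i(n) s_i(n+1)^T (p_i^* - p_i(n)) \mid \mathcal{F}_n]$.

Clearly, $e_n$, $\alpha_n$, and $\beta_n$ are finite, nonnegative, $\mathcal{F}_n$-measurable, and $\sum_{n=0}^{\infty} \alpha_n = 0 < \infty$.
$\sum_{n=0}^{\infty} \beta_n < \infty$ follows from assumptions (ii) and (iii) in Theorem 1. $\eta_n$ is also obviously finite and
$\mathcal{F}_n$-measurable. The nonnegativeness of $\eta_n$ follows from Lemma 2 and assumption (i) in Theorem 1. Thus,
all the conditions in Lemma 5 are satisfied. We conclude that $e_n = \| p(n) - p^* \|_2^2$ converges almost surely,
and that

$$\sum_{n=0}^{\infty} \eta_n = \sum_{n=0}^{\infty} \mathbb{E}\Big[ 2 \sum_{i \in \mathcal{N}} a_i(n) s_i(n+1)^T (p_i^* - p_i(n)) \,\Big|\, \mathcal{F}_n \Big] \tag{45}$$

is finite almost surely.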

We still need to show that $e_n = \| p(n) - p^* \|_2^2$ converges to 0 almost surely. If this is not true, the event
$A = \{w : \lim_{n \to \infty} e_n(w) = e(w) > 0\}$ has nonzero probability, where $w$ is a trajectory on the associated
probability space. Then for any $w \in A$, there exists a large enough $N(w)$ such that $e_n(w) \geq e(w)/2$ for
all $n \geq N(w)$, and hence

$$
\begin{aligned}
\sum_{n=N(w)}^{\infty} 2 \sum_{i \in \mathcal{N}} a_i(n) s_i(n+1)^T (p_i^* - p_i(n)) &\geq 2 \sum_{n=N(w)}^{\infty} \min_{i} a_i(n) \min \tau(s) \| p^* - p(n) \|_2^2 \\
&\geq \sum_{n=N(w)}^{\infty} \min_{i} a_i(n) \min \tau(s) \, e(w) = +\infty,
\end{aligned} \tag{46}
$$

where the first inequality follows from Lemma 2 and the last step follows from assumption (ii)
in Theorem 1. Since (46) happens with nonzero probability, the random sum $\sum_{n=0}^{\infty} \eta_n$ in (45) cannot
be finite almost surely, resulting in a contradiction. Hence, we conclude that $(p(n))_{n=0}^{\infty}$ converges to $p^*$
almost surely. This completes the proof. ■

Appendix F 
Proof of Lemma 3 

Proof: Note that we assume that $G$ is bounded almost surely. Given a bounded realization $g$, it is
straightforward to verify that $s(p|g)$, where $s_j(p|g) = \nabla_{p_j} R_j(p_j, p_{-j}|g)$, has a bounded derivative and
is thus Lipschitz continuous. It follows that $s(p)$ is Lipschitz continuous almost surely. ■

Appendix G
Proof of Theorem 2 

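Proof: The proof essentially follows the same arguments as the proof of Theorem 1. Specifically, we
first observe that

$$
\begin{aligned}
& \mathbb{E}[\| p(n+1) - p^* \|_2^2 \mid \mathcal{F}_n] \\
&= \mathbb{E}[\| \mathcal{P}_{\Phi}(p(n) + D(n)\hat{s}(n)) - \mathcal{P}_{\Phi}(p^* + D(n)s(p^*)) \|_2^2 \mid \mathcal{F}_n] \\
&\leq \mathbb{E}[\| p(n) - p^* + D(n)(\hat{s}(n) - s(p^*)) \|_2^2 \mid \mathcal{F}_n] \\
&= \mathbb{E}[\| p(n) - p^* + D(n)(s(p(n)) + \theta(n) - s(p^*)) \|_2^2 \mid \mathcal{F}_n] \\
&= \| p(n) - p^* \|_2^2 + \sum_{i \in \mathcal{N}} a_i^2(n) \| s_i(p(n)) - s_i(p^*) \|_2^2 + 2 \sum_{i \in \mathcal{N}} a_i(n) (s_i(p(n)) - s_i(p^*))^T (p_i(n) - p_i^*) \\
&\quad + \mathbb{E}[\| D(n)\theta(n) \|_2^2 \mid \mathcal{F}_n] + 2 (p(n) - p^*)^T \mathbb{E}[D(n)\theta(n) \mid \mathcal{F}_n] + 2 (s(p(n)) - s(p^*))^T \mathbb{E}[D^2(n)\theta(n) \mid \mathcal{F}_n] \\
&= \| p(n) - p^* \|_2^2 + \mathbb{E}[\| D(n)\theta(n) \|_2^2 \mid \mathcal{F}_n] + \sum_{i \in \mathcal{N}} a_i^2(n) \| s_i(p(n)) - s_i(p^*) \|_2^2 \\
&\quad + 2 \sum_{i \in \mathcal{N}} a_i(n) (s_i(p(n)) - s_i(p^*))^T (p_i(n) - p_i^*) \\
&\leq \| p(n) - p^* \|_2^2 + \mathbb{E}[\| D(n)\theta(n) \|_2^2 \mid \mathcal{F}_n] + \max_{i \in \mathcal{N}} a_i^2(n) \sum_{i \in \mathcal{N}} \| s_i(p(n)) - s_i(p^*) \|_2^2 \\
&\quad + 2 \sum_{i \in \mathcal{N}} a_i(n) (s_i(p(n)) - s_i(p^*))^T (p_i(n) - p_i^*) \\
&\leq \| p(n) - p^* \|_2^2 + \mathbb{E}[\| D(n)\theta(n) \|_2^2 \mid \mathcal{F}_n] + L^2 \max_{i \in \mathcal{N}} a_i^2(n) \sum_{i \in \mathcal{N}} \| p_i(n) - p_i^* \|_2^2 \\
&\quad - 2 \min_{i \in \mathcal{N}} a_i(n) \tau(s) \| p(n) - p^* \|_2^2 \\
&= \| p(n) - p^* \|_2^2 + \sum_{i \in \mathcal{N}} a_i^2(n) \mathbb{E}[\| \theta_i(n) \|_2^2 \mid \mathcal{F}_n] - \Big( 2 \tau(s) \min_{i \in \mathcal{N}} a_i(n) - L^2 \max_{i \in \mathcal{N}} a_i^2(n) \Big) \| p(n) - p^* \|_2^2.
\end{aligned} \tag{47}
$$

Here the first equality follows from (13). The first inequality follows from (14). The fourth equality follows
from the assumption $\mathbb{E}[\theta(n)|\mathcal{F}_n] = 0$. The last inequality follows from Lemma 3, assumption (i) in Theorem
2, and Lemma 2.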

Substitute $e_n = \| p(n) - p^* \|_2^2$, $\alpha_n = 0$, $\beta_n = \sum_{i \in \mathcal{N}} a_i^2(n) \mathbb{E}[\| \theta_i(n) \|_2^2 \mid \mathcal{F}_n]$, and
$\eta_n = \big( 2 \tau(s) \min_{i \in \mathcal{N}} a_i(n) - L^2 \max_{i \in \mathcal{N}} a_i^2(n) \big) \| p(n) - p^* \|_2^2$. It is straightforward to verify
that all the assumptions in Lemma 5 are satisfied. We conclude that $e_n = \| p(n) - p^* \|_2^2$ converges almost
surely, and that

$$\sum_{n=0}^{\infty} \eta_n = \sum_{n=0}^{\infty} \Big( 2 \tau(s) \min_{i \in \mathcal{N}} a_i(n) - L^2 \max_{i \in \mathcal{N}} a_i^2(n) \Big) \| p(n) - p^* \|_2^2 \tag{48}$$

is finite almost surely.

We further claim that $e_n = \| p(n) - p^* \|_2^2$ converges to 0 almost surely. Since
$2 \tau(s) \min_{i \in \mathcal{N}} a_i(n) - L^2 \max_{i \in \mathcal{N}} a_i^2(n)$ is bounded away from 0 by assumption (ii), this claim holds by
an argument by contradiction similar to that in the proof of Theorem 1. ■

Appendix H 
Proof of Proposition 3 

Proof: (i). Note that $s(p)$ is Lipschitz continuous by Lemma 3. Hence $\mathrm{PDS}(s, \Phi)$ is well posed and
the results follow from Theorem 2.5 in [15].

(ii). The equivalence of the NE in game $\mathcal{G}$ and the set of stationary points in the PDS can be shown by
observing that $\Phi$ is a convex polyhedron and following [13]. We provide a sketch of the proof here for
completeness. Define a variational inequality problem $\mathrm{VI}(s, \Phi)$, the aim of which is to find a vector
$p^* \in \Phi$ such that

$$(p - p^*)^T s(p^*) \leq 0, \ \forall p \in \Phi. \tag{49}$$

It is known that $p^*$ is a solution to $\mathrm{VI}(s, \Phi)$ if and only if it is an NE of the game $\mathcal{G}$ (see, e.g., [26]).
Noting that $\Phi$ is a convex polyhedron, the stationary points of $\mathrm{PDS}(s, \Phi)$ coincide with the solutions of
$\mathrm{VI}(s, \Phi)$ by Theorem 2.4 in [13].

(iii). Recall that the condition that $\Gamma \succ 0$ holds almost surely implies the strong monotonicity of $s(p)$.
Then the uniqueness and global exponential stability follow from Theorem 3.7 in [13]. Indeed, we can
associate a Lyapunov function $\| p - p^* \|_2$ with $\mathrm{PDS}(s, \Phi)$ to obtain the stability result. ■

Fig. 1. Comparison of IWFA and SDLA-I: In SDLA-I, $v = 20\%$ and $a_n = 0.5$.




Fig. 2. Impact of Step Sizes: $v = 20\%$.

Fig. 3. Impact of Time-Varying Rate: $a_n = 0.1$.

Fig. 4. Learning NE with Averaging - SDLA-II: $v = 30\%$ and $a_n = 0.5$.



